Apache Flink™: Stream and Batch Processing in a Single Engine
Authors
Abstract
Apache Flink is an open-source system for processing streaming and batch data. Flink is built on the philosophy that many classes of data processing applications, including real-time analytics, continuous data pipelines, historic data processing (batch), and iterative algorithms (machine learning, graph analysis) can be expressed and executed as pipelined fault-tolerant dataflows. In this paper, we present Flink’s architecture and expand on how a (seemingly diverse) set of use cases can be unified under a single execution model.
Similar resources
In-Stream Big Data Processing
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that realtime query processing and in-stream processing is the immediate need in many practical applications. In recent years, this idea got a lot of traction and a whole bunch of solutions like Twitter’s Storm, Yahoo’s S4, Cloudera’s Impala, A...
Reproducible Experiments for Comparing Apache Flink and Apache Spark on Public Clouds
Big data processing is a hot topic in today’s computer science world. There is a significant demand for analysing big data to satisfy many requirements of many industries. Emergence of the Kappa architecture created a strong requirement for a highly capable and efficient data processing engine. Therefore data processing engines such as Apache Flink and Apache Spark emerged in open source world ...
Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to only batch processing workloads. We ...
A comparison on scalability for batch big data processing on Apache Spark and Apache Flink
The large amounts of data have created a need for new fram...
An improved memetic algorithm to minimize earliness–tardiness on a single batch processing machine
In this research, a single batch processing machine scheduling problem with minimization of total earliness and tardiness as the objective function is investigated. We first formulate the problem as a mixed integer linear programming model. Since the research problem is shown to be NP-hard, an improved memetic algorithm is proposed to efficiently solve the problem. To further enhance the memetic ...